Some General Results of Elementary Sampling Theory 
for Engineering Use 

By PAUL P. COGGINS 

EVERY day we base conclusions on the results of the process 
commonly known as "sampling." For example, if five times 
in a week a man has waited ten minutes or more for his trolley at a 
street corner, he may conclude that the transportation facilities are 
poor. Or again, if a housewife has bought ten loaves of bread at a 
certain store and has found five of them not as fresh as might be 
desired, she decides that in the future she will buy her bread elsewhere. 
Both of these conclusions are based on an intuitive application of 
sampling theory. Such examples could be multiplied indefinitely. 

Similarly, in most engineering problems, observational data are 
involved in one way or another. In order to be able to assign 
the proper significance to these data, it is essential to have some 
idea as to their reliability, that is, to what extent they represent all 
the facts under consideration. First, the measurements themselves 
may be in error. In the second place, although the observations 
may have been made with perfect precision, they may be incomplete; 
they may constitute but a "sample" of a large group of possible 
observations. The problem considered in this paper is one of this 
second class, generally known as "sampling" problems. 

Assume the existence of a total group or "universe" of N objects 
and that observations have been made on a certain number n of 
them with reference to a particular characteristic. This number n 
we will call the "sample." From this sample we wish to deduce 
some estimate concerning the probable condition of that universe 
with reference to the characteristic observed. 

Now the characteristic observed may itself take on one of two 
forms. It may be either, (1) present or absent; (2) quantitative. 
For simplicity in discussion we may call the first, "Sampling of 
Attributes," and the second, "Sampling of Variables." 

An example of each will be cited from the telephone field. 

Example 1: Sampling of Attributes 

Suppose that 4,000 relays of a particular type constitute a day's
output. In order to determine roughly what proportion of these are
non-operative at a current of 12 mils, a sample of 500 relays is tested
and out of this sample 10 fail to operate at the required current. In
the sample, then, two per cent of the relays were defective. What,
then, is the probability that the percentage of the 4,000 relays having 
this defect is between one and three per cent? Or what is the proba- 
bility that the percentage of defectives in the universe of 4,000 does not 
exceed four per cent? Or again, if we wish to be practically certain that
among the 4,000 relays not more than two per cent are defective in 
this respect, how many defectives would be allowable in a sample of 
200? or a sample of 1,000? Any number of questions of this sort 
can be asked and may be answered on the basis of the proper assump- 
tions by sampling theory. 

Example 2: Sampling of Variables 

An office serves 5,000 subscribers' lines. Measurements of the
insulation resistance are made on 200 of these, selected at random, 
and the resulting values tabulated. They vary all the way from 
12,000 ohms to 200,000 ohms. What conclusions may be drawn as 
to the probability that more than a certain number, say 20 of the 
subscribers' loops out of the 5,000, have an insulation resistance of 
less than 18,000 ohms? What is the most probable distribution of 
the insulation resistances for the office as a whole? What is the 
probable error of the average of the observations as a measure of the 
average loop insulation resistance for the office? 

As before, much information regarding the universe may be inferred 
from a properly chosen sample, always, however, with some degree 
of uncertainty. This uncertainty, so far as the sampling process is 
concerned, naturally decreases as the size of the sample increases, 
and, of course, disappears except for inaccuracies of measurement, 
when the sample becomes coextensive with the universe. 

The respective treatments of these two types of problems differ 
considerably in detail. The basic principles are, however, essentially 
the same, and involve in each case the notions of "a posteriori" 
probability, as discussed in most of the standard textbooks on the 
theory of probability. 

In both problems there are certain observations. By means of 
these we desire to obtain as precise information as possible concerning 
some one or more characteristics of the universe from which these 
observations or samples were drawn. The true nature of the universe 
is to some degree, at least, unknown. Certain hypotheses concerning 
it may, however, in the light of the sample be more probable than 
others. What we wish to estimate is the probability that either a 
particular hypothesis or a group of mutually exclusive hypotheses 
includes the true one. 

This article will be devoted to the type of problem termed "Sampling
of Attributes." 1 In it are included results from an extensive
series of computations in the form of charts which may be of value 
in the solution of practical engineering problems. The nomenclature 
is general, so as to be applicable to a wide variety of practical problems. 
For convenience in discussion we shall divide the units of any sample 
into the two mutually exclusive classes, "defective" and "satis- 
factory." The following notation will be used: 

N = total number of items in universe,
n = total number of items in sample,
X = number of defective items in universe (unknown),
c = number of defective items in sample (observed),
w(X) = a priori probability that the universe will contain
exactly X defectives,
$W(X_1, X_2)$ = a posteriori probability that the universe contains a
number of defectives X such that $X_1 \le X \le X_2$.

It is of extreme importance that, at the outset, the significance of 
the symbol w(X) in sampling problems be clearly defined. It is a 
measure of the probability, before the sample is taken, that the lot or 
universe in question contains X defective items and N — X satis- 
factory items. It may be based on previous samples, or the reputation 
of the manufacturer producing those items, or on any one or more 
of a number of other pertinent data. For example, even before a 
sampling inspection, we should unhesitatingly say that in a lot of 
1,000 relays sent out by a reputable manufacturer it is very much 
more likely, a priori, that the lot will contain less than 100 relays 
with a short-circuited winding than that the lot will contain more 
than 800 relays defective in the same respect. We should probably 
find ourselves in a quandary, however, if we attempted to state 
without a sample inspection, the relative likelihoods of 3, 4, 5, 6, 
..., etc., defectives existing in the lot. w(X) is a function whose
numerical value is assumed to state this a priori probability. The 
extent to which we are able to make use of this function, then, depends 
on how precisely we are able to assign numerical values to it before 
we study our sample. 

1 This general type of problem has been under study within the Bell System for 
some time. In an article "Deviation of Random Samples from Average Conditions 
and Significance to Traffic Men" by E. C. Molina and R. P. Crowell which appeared 
in the Bell System Technical Journal for January, 1924, a special case of sampling 
theory was developed and various possible applications were suggested. In August, 
1924, Molina delivered a paper entitled "A Formula for the Solution of Some Prob- 
lems in Sampling" before the statistical section of the International Mathematical 
Congress in Toronto, Canada. This paper dealt with a somewhat more general 
case of the sampling problem than was discussed in the article just mentioned. 

It may be helpful at this point to state and solve a simple problem,
which will serve to bring out the fundamental principles involved. 
An urn is known to contain 10 balls, some of which are white and the 
others black. Five balls have been drawn and not replaced. Of 
these five, one is white and four are black. What is now the proba- 
bility that the urn originally contained just one white ball and nine 
black? Two white and eight black? 

Before we proceed to obtain a solution for this problem we have 
to make some assumption, based on knowledge available before the 
drawings were made, concerning the probability that the urn contains 
black and white balls in any given proportion. 

Consider two such assumptions — 

(a) All proportions are a priori equally likely, i.e., before the 
drawings it is as likely that three whites and seven blacks were put 
in the urn as six whites and four blacks, etc. 

(b) The urn was filled with ten balls drawn at random from a bag 
containing a very large number of balls of which a quarter are white 
and the remainder are black. 

There are, before the drawings, 11 possible hypotheses concerning
the contents of the urn. They range from 0 whites and 10 blacks
to 10 whites and 0 blacks, as listed in the two left-hand columns of
Table I and shown in Fig. 1. The probability in favor of each of
these hypotheses is the "a priori existence probability" in favor of
the hypothesis, and is represented by the symbol w(X), X referring
to the number of white balls assumed to be in the urn.

Fig. 1. The upper curve shows two different assumptions concerning the
a priori probabilities, while the lower pair shows the a posteriori probabilities. In
both cases the dots refer to the hypothesis of uniform a priori probability while the
circles refer to the assumption that the urn itself is a random sample from a large
stock of which one fourth of the balls are white.

Under assumption "a" (Case 1) each hypothesis has a probability
of 1/11 or .090909. Under assumption "b" (Case 2) the probability
that the urn contains X whites and 10 - X blacks is the binomial term

$$w(X) = \binom{10}{X}\left(\frac{1}{4}\right)^{X}\left(\frac{3}{4}\right)^{10-X}.$$

In the column headed $p_X$ we give the productive probabilities for
both cases. These are the probabilities that five drawings from an urn 
whose contents were as given by the corresponding hypothesis would 
yield the observed one white ball and four black. These are zero in 
the case of X = 0, 7, 8, 9 and 10 since urns so constituted could not 
have given the observed drawings. 

For the other cases, the productive probability is the ratio

$$p_X = \frac{\binom{X}{1}\binom{10-X}{4}}{\binom{10}{5}}.$$

In this expression the denominator is the total number of combinations
of 10 balls taken five at a time, and the numerator is the
number of ways of selecting one out of X white balls and four out
of the remaining 10 - X black balls. These figures are tabulated in
Table I under the heading $p_X$.

We now have all of the component parts of our problem under the
two different assumptions "a" and "b." It only remains to apply
"Bayes' Rule." 2 Now the generalized Bayes Rule tells us that the
a posteriori probability, $P_X$, in favor of an hypothesis after the drawings
have been made and taking account of the a priori information is
given by the ratio

$$P_X = \frac{w(X)\,p_X}{\sum_X w(X)\,p_X},$$

the summation in the denominator being extended over all possible
cases.

2 This rule was first enunciated by an English cleric, Bayes by name, in a memoir
in Philosophical Transactions for 1763. It was generalized by Laplace in 1812 to
cover cases not equally likely.

The numerical values of this ratio are shown in the last two columns 
of Table I, corresponding to the two assumptions in our problem 
and also by the circles in Fig. 1. That we should have a different 
set of results corresponding to the different assumptions is to be 
expected. It is interesting, however, that the difference in this case 
is by no means great as Fig. 1 brings out. 
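The arithmetic of the urn problem may be sketched in a few lines of Python. This is an illustrative reconstruction under the two assumptions stated above, not the original tabulation. The names `productive`, `posterior`, `uniform` and `binomial` are chosen here for convenience, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction
from math import comb

BALLS, DRAWN, WHITE_SEEN = 10, 5, 1  # urn size, balls drawn, whites observed

def productive(x):
    """p_X: chance that 5 draws without replacement give 1 white and 4 black."""
    return Fraction(comb(x, WHITE_SEEN) * comb(BALLS - x, DRAWN - WHITE_SEEN),
                    comb(BALLS, DRAWN))

def posterior(prior):
    """Bayes' rule: P_X = w(X) p_X divided by the sum of w(X) p_X over all X."""
    joint = [prior[x] * productive(x) for x in range(BALLS + 1)]
    total = sum(joint)
    return [j / total for j in joint]

# Assumption (a): all 11 compositions equally likely a priori.
uniform = [Fraction(1, BALLS + 1)] * (BALLS + 1)
# Assumption (b): urn filled at random from a stock that is one quarter white.
binomial = [comb(BALLS, x) * Fraction(1, 4) ** x * Fraction(3, 4) ** (BALLS - x)
            for x in range(BALLS + 1)]

post_a = posterior(uniform)
post_b = posterior(binomial)

for x in range(BALLS + 1):
    print(f"X={x:2d}  p_X={float(productive(x)):.4f}  "
          f"P_X(a)={float(post_a[x]):.4f}  P_X(b)={float(post_b[x]):.4f}")
```

Under either assumption the most probable hypothesis is that the urn held two white balls, with a posteriori probability 10/33 in case (a) and 405/1024 in case (b).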

If after each drawing we had replaced the ball drawn, we would
have used for the productive probability $p_X$ the binomial term

$$p_X = \binom{5}{1}\left(\frac{X}{10}\right)^{1}\left(\frac{10-X}{10}\right)^{4},$$

since the successive drawings would not have changed the relative 
constitution of the urn. The same would also be true if the urn 
contained an indefinitely large number of balls with the same relative 
proportions of black and white. 
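As a small illustration of the distinction just drawn, the two forms of the productive probability may be compared side by side. The function names below are illustrative only and do not come from the original.

```python
from math import comb

def p_without_replacement(x, balls=10):
    """Hypergeometric form: 1 white and 4 black in 5 draws, none replaced."""
    return comb(x, 1) * comb(balls - x, 4) / comb(balls, 5)

def p_with_replacement(x, balls=10):
    """Binomial form: every draw is white with the fixed chance x / balls."""
    q = x / balls
    return comb(5, 1) * q * (1 - q) ** 4

for x in range(11):
    print(x, round(p_without_replacement(x), 4), round(p_with_replacement(x), 4))
```

With replacement, hypotheses such as X = 7 are no longer excluded by the observed drawings, since the same black ball may be drawn more than once.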

Now if we agree that a white ball corresponds to a defective item
and a black ball to an acceptable item, we are immediately able,
by the use of these fundamental principles of a posteriori probability,
to write the general basic formal relation

$$W(X_1, X_2) = \frac{\sum_{X=X_1}^{X_2} w(X)\binom{X}{c}\binom{N-X}{n-c}}{\sum_{X=c}^{N-n+c} w(X)\binom{X}{c}\binom{N-X}{n-c}}. \tag{1}$$


As we have just indicated, the troublesome element in this formula 
is the function w(X) to which, in many practical problems, it is 
difficult to assign any particular numerical values. In order to 
proceed further, therefore, without detailed consideration of various 
specific engineering problems we are forced to make some rather 
general assumptions concerning the nature of the function w(X). 
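The basic relation for W in terms of w(X) may be sketched as a short function that accepts any a priori function. The name `weight` and the geometric prior used in the second call are hypothetical choices made here for illustration only.

```python
from fractions import Fraction
from math import comb

def weight(N, n, c, X1, X2, w):
    """A posteriori probability that X1 <= X <= X2, given c defectives in n.

    w is any callable giving the a priori probability w(X). It need not be
    normalized, since a constant factor cancels in the ratio.
    """
    def term(X):
        return w(X) * comb(X, c) * comb(N - X, n - c)

    numerator = sum(term(X) for X in range(max(X1, c), min(X2, N - n + c) + 1))
    denominator = sum(term(X) for X in range(c, N - n + c + 1))
    return Fraction(numerator) / Fraction(denominator)

# The urn problem: 10 balls, 5 drawn, 1 white seen, constant a priori function.
print(weight(10, 5, 1, 1, 2, lambda X: 1))
# A purely hypothetical prior that halves with each additional defective.
print(weight(10, 5, 1, 1, 2, lambda X: Fraction(1, 2) ** X))
```

The first call gives the chance that the urn held one or two white balls when all compositions are equally likely a priori; the second shows how a prior favoring few defectives raises that chance.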

Case I 
One of the most natural assumptions to make when no knowledge
exists to the contrary is that w(X) is a constant within that range
of values of X which essentially affects the value of the denominator
of (1). This assumption may seem at first glance rather arbitrary
and wide of the mark, especially since the range of values which
essentially affects the value of the denominator in (1) depends on the
value of c obtained from the sample. However, if the sample is
reasonably large, consisting of 100 items or more, and the proportion
of defectives observed is small, say 10 per cent or under, the probability
that universes having a proportion of defectives widely different
from the one observed would yield such results is so small that a
wide range of assumptions concerning the a priori probability of
such universes existing makes very little change in the final result.

3 It should be noted that in his original treatment of this formula Molina used
S instead of Σ as the symbol for summation on account of the fact that finite
integration entered into his analysis. Since in this presentation we are dealing only
with summation, we shall use the commoner form Σ to denote summation.

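Applying, then, this assumption analytically to the basic formula
(1) we obtain the simpler formula

$$W(X_1, X_2) = \frac{\sum_{X=X_1}^{X_2}\binom{X}{c}\binom{N-X}{n-c}}{\sum_{X=c}^{N-n+c}\binom{X}{c}\binom{N-X}{n-c}} = \frac{\sum_{X=X_1}^{X_2}\binom{X}{c}\binom{N-X}{n-c}}{\binom{N+1}{n+1}} \tag{2}$$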

and by means of a transformation outlined in the Appendix we obtain,
from (2),

$$W(X_1, X_2) = \frac{\sum_{t=0}^{c}\binom{X_1}{t}\binom{N+1-X_1}{n+1-t} - \sum_{t=0}^{c}\binom{X_2+1}{t}\binom{N-X_2}{n+1-t}}{\binom{N+1}{n+1}}. \tag{2a}$$

Formula (2a) is the one embodied in the paper referred to in footnote 
1. While apparently less simple than (2), it is actually easier to 
compute when c is less than the range $X_2 - X_1$.

When in (2a) we set $X_1 = c$ and $X_2 = X$ the resulting formula

$$W(c, X) = 1 - \frac{\sum_{t=0}^{c}\binom{X+1}{t}\binom{N-X}{n+1-t}}{\binom{N+1}{n+1}}, \tag{3}$$

which is at the basis of our computational work, shows explicitly 
certain properties which are not apparent in (2). Various analytical 
transformations and approximations based on this formula lead to 
several interesting extensions which are discussed in the Appendix. 
We shall leave these phases of the problem for the present, however, 
and discuss the results of the calculations which have been made as 
presented on the attached charts. 
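The equivalence of the three computing forms for a constant a priori function may be checked numerically. The sketch below is only an illustration: the function names `w2`, `w2a` and `w3` and the trial values of N, n, c are arbitrary choices, and exact fractions are used so that agreement is exact rather than approximate.

```python
from fractions import Fraction
from math import comb

def w2(N, n, c, X1, X2):
    """Formula (2): direct summation over X from X1 to X2."""
    num = sum(comb(X, c) * comb(N - X, n - c) for X in range(X1, X2 + 1))
    return Fraction(num, comb(N + 1, n + 1))

def w2a(N, n, c, X1, X2):
    """Formula (2a): difference of two sums over t = 0, ..., c."""
    first = sum(comb(X1, t) * comb(N + 1 - X1, n + 1 - t) for t in range(c + 1))
    second = sum(comb(X2 + 1, t) * comb(N - X2, n + 1 - t) for t in range(c + 1))
    return Fraction(first - second, comb(N + 1, n + 1))

def w3(N, n, c, X):
    """Formula (3): the special case X1 = c, X2 = X."""
    tail = sum(comb(X + 1, t) * comb(N - X, n + 1 - t) for t in range(c + 1))
    return 1 - Fraction(tail, comb(N + 1, n + 1))

N, n, c = 50, 20, 2  # arbitrary small trial values
print(w2(N, n, c, 3, 9), w2a(N, n, c, 3, 9))
print(w2(N, n, c, c, 9), w3(N, n, c, 9))
```

Formula (2) requires $X_2 - X_1 + 1$ terms while (2a) and (3) require about c + 1 terms in each sum, which is why the latter are preferred when few defectives are observed.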

Charts A

Charts A have been prepared by means of exact formula (3) to 
show, for universes N = 300, 500, 700 and 900 from which samples, 
n, of various indicated sizes are assumed to have been drawn, the 
probability or "weight" W(c, X) as ordinate versus X as abscissa
for various values of c as indicated by the solid curves so designated. 
The dotted curves crossing these solid curves show the weight indicated 
by various values of the difference "d" between the percentage 
observed defective and the percentage assumed defectives in the 
universe. 

As examples illustrating the interpretation of Charts A consider 
the following: 

Example 1: From a universe of N = 700 items a random sample 
n = 300 items has shown c = 3 or 1 per cent defectives. What is 
the probability or weight to be associated with the hypothesis that 
the universe contains not more than X = 14 or two per cent defectives? 
From the A Chart corresponding to N = 700 and n = 300 we find 
the c = 3 curve (shown heavy because it is an even per cent of the 
sample n = 300). On this curve corresponding to an abscissa of 
X = 14 we read our desired result as the ordinate W = .94. We 
note that this is also a point on the d = 1 per cent dotted curve 
since 100(X/N - c/n) per cent = 1 per cent.
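The chart reading of Example 1 may be checked by direct computation. The sketch below assumes the Case I formula for a constant a priori function, in the form $W(c, X) = 1 - \sum_{t=0}^{c}\binom{X+1}{t}\binom{N-X}{n+1-t} / \binom{N+1}{n+1}$; the function name `weight` is an illustrative choice.

```python
from math import comb

def weight(N, n, c, X):
    """Probability that the universe holds at most X defectives (Case I)."""
    tail = sum(comb(X + 1, t) * comb(N - X, n + 1 - t) for t in range(c + 1))
    return 1 - tail / comb(N + 1, n + 1)

W = weight(N=700, n=300, c=3, X=14)
d = 100 * (14 / 700 - 3 / 300)  # difference d between the two percentages
print(round(W, 3), round(d, 3))
```

The computed weight agrees with the value of about .94 read from the chart, and d comes out as 1 per cent.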

Example 2: We are going to make a sample of n = 200 items out
of a universe of N = 500 items and wish the weight or probability to 
be .9 or better that the universe does not contain more than five 
per cent defective items. What is the maximum number of defective 
items that we may tolerate in our sample? Now five per cent of 
N = 500 is X = 25. Corresponding to an abscissa X = 25 and an 
ordinate W = .9 we locate a point which lies between the c = 6 and 
c = 7 curves. We could, therefore, accept the lot provided the 
sample showed six or less defectives, or three per cent or less defectives. 
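The acceptance number of Example 2 may likewise be found by stepping through successive values of c. The sketch takes the sample as n = 200, for which six defectives are three per cent, and assumes the Case I formula; `weight` and `max_acceptable_defects` are names introduced here for illustration.

```python
from math import comb

def weight(N, n, c, X):
    """Probability that the universe holds at most X defectives (Case I)."""
    tail = sum(comb(X + 1, t) * comb(N - X, n + 1 - t) for t in range(c + 1))
    return 1 - tail / comb(N + 1, n + 1)

def max_acceptable_defects(N, n, X, W_min):
    """Largest c for which W(c, X) is still at least W_min, or None."""
    best = None
    for c in range(min(n, X) + 1):
        if weight(N, n, c, X) >= W_min:
            best = c
        else:
            break  # W falls as c rises, so no larger c can qualify
    return best

c_max = max_acceptable_defects(N=500, n=200, X=25, W_min=0.9)
print(c_max)
```

The search stops between c = 6 and c = 7, in agreement with the position of the point read from the chart.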

These Charts A are fundamental in nature, and involve the five 
variables, N, n, X, c and W. The formula by means of which they 
were computed is exact on the basis of the assumptions. Such errors 
or irregularities as may appear to exist in them are of negligible 
practical importance in view of the nature of the assumptions made, 
and are mainly due to the difficulties in drafting such a family of 
curves. 

Naturally a function involving several variables may be represented
graphically in many different ways, some of which may be more
convenient than others to use in connection with various practical
problems. One of the restrictions often encountered in practical
problems is that the weight W shall not be less than some specified
figure which may be considered to give us the desired degree of confi- 
dence in the efficacy of our sampling procedure in weeding out defective 
lots. Charts B and C are drawn up on the basis of three such specified 
figures which are of practical interest, W = .75, W = .9, and W 
= .99. Such restrictions enable us to show, without the large amount 
of labor which would be required without them, the results of calcu- 
lations for a wider range of the other variables. 

Chart B 

Chart B shows roughly for the proportion of observed defectives 
c/n = .01, .04, and .07, the proportion of defectives in the universe 
which we may expect not to exceed with weights W = .75, .9 and .99 
for various values of the sample n as abscissa and for N = 300, 500, 
700, 900 and also the limit approached as N becomes infinite. This 
form of presentation serves to relate the present material to the 
earlier charts which accompanied the earlier article already mentioned 
as having appeared in the Bell System Technical Journal for January, 
1924, and shows how with a given size of sample n and a given pro- 
portion of defectives observed, the larger the value of the universe N, 
the larger the variation which may be expected with any given degree 
of probability. As would be expected, we also see that when the 
size of the sample approaches the size of the universe, the range of 
uncertainty approaches zero and our sample inspection becomes a
complete inspection. 

It will be noted that, up to the present point, we have not considered 
cases for N > 1,000. The exact formulas become rather troublesome 
to compute for these larger values of N. Fortunately, however, 
various approximate methods outlined in the Appendix become suf- 
ficiently accurate to be of service in these cases. 

Charts C 

We have, therefore, by their aid when N > 1,000, prepared the 
Charts C which we believe will cover a rather wide range of the 
variables with sufficient precision to be of considerable practical 
value. The points shown by dots are believed to be accurate to the 
degree to which they are readable on the chart. For intermediate 
values and for other values of the trouble limit the discrepancies are 
indicated on the charts. One of these charts corresponds to each of 
the three following weights, W = .75, W = .9, and W = .99. As 
abscissa we show the per cent sample, 100 n/N. The ordinate scale 
is proportional to the number of items n in the sample. The same 
proportionality factor K enters also in the ratio X/N which we designate
as the trouble limit. We shall later discuss the purpose of this
factor K in more detail. The understanding of the charts will be
simplified, however, if we consider the case for K = 1 in which the
charts become direct reading for the case of a trouble limit X/N = .01.

The values of c, the number of defective items observed in the 
sample, are shown as a family of curves marked c = 0, c = 1, c = 2, 
etc., sloping downward from left to right. Any point on the c = 5 
curve, for example, on the Chart C for weight W = .9 shows the
corresponding values of n as ordinate and n/N as abscissa which are 
necessary in order that this number of defectives may be accepted 
with a degree of assurance 4 indicated by W = .9 that the true pro- 
portion of defectives in the universe N is not greater than .01. 

It will be readily noted that for every value of the universe N, 
there may be drawn a diagonal straight line through the origin whose 
ordinate for an abscissa of 100 per cent sample is equal to n = N. 
Certain representative N lines are drawn in on the charts in this 
manner, and as many more could be inserted as desirable. Thus, 
for a constant value of W and a constant value of X/N we have 
provided on Charts C a ready means of determining the relationships 
which must exist between the remaining variables N, n, and c. 

As an example of the use of these charts for the case where K = 1,
i.e., for X/N = .01, consider the following: 

Example 3: In a sample of n = 900 out of a universe N = 3,000, 
what is the maximum number of defectives c that we may accept 
with an assurance of W = .9 or better that the true proportion of
defectives in the universe is not greater than .01? 

Referring to the Charts C for W = .9 and considering K = 1,
we locate the point corresponding to an abscissa of 100 n/N per cent
= 90,000/3,000 = 30 per cent, and an ordinate n = 900. We find
that this lies on the diagonal straight line marked N = 3,000K as
it should and that it also lies between the c = 5 and c = 6 curves.
From this we may infer that we may accept five defectives but not 
six in the above case. 
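Although Charts C were prepared with the aid of approximate methods, the universe of Example 3 is still small enough for the exact Case I formula to be evaluated directly. The following sketch does so for c = 5 and c = 6; the function name `weight` is illustrative.

```python
from math import comb

def weight(N, n, c, X):
    """Probability that the universe holds at most X defectives (Case I)."""
    tail = sum(comb(X + 1, t) * comb(N - X, n + 1 - t) for t in range(c + 1))
    return 1 - tail / comb(N + 1, n + 1)

N, n = 3000, 900
X = N // 100  # trouble limit X/N = .01
W5 = weight(N, n, 5, X)
W6 = weight(N, n, 6, X)
print(f"W(c=5) = {W5:.3f}, W(c=6) = {W6:.3f}")
```

The weight for five defectives lies above .9 and that for six lies below it, which is the same conclusion as that drawn from the chart.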

We shall now proceed to explain the significance of the factor K 
and the cross-hatched areas beneath the c = 0, 5, 10, 15, etc., curves. 
The purpose of these features is to extend the application of Charts C 
to values of X/N other than .01. It may be noted from the mathe- 
matical analysis or from actual plotting of charts similar to Charts C, 
but for different values of X/N, that the general shape and spacing 
of the curves remains practically unchanged for any given value of W. 

4 This statement is not strictly true when we are dealing with non-integral values
of X. In such cases the weights W shown on the Charts C are slightly too high.

In other words, the value of W(c, X) depends mainly on the ratio
n/N, and the values of X and c, and only in a secondary way on the 
absolute values of n and N. This being the case, if we make a given 
per cent sample of two different universes N and KN, the number of 
defectives c which we may allow in our sample out of the first universe 
N in order that our weight W may have a given value, .9 say, for the 
true proportion of defectives in this universe to be not greater than 
.01 is practically the same as the value of c that we may allow in the 
sample out of the second universe KN for the same weight W and a 
proportion of defectives .01/K. For values of K > 1 there is no
appreciable change introduced in the location of the c curves on 
Charts C. For values of K < 1, some error is made. The magnitude 
of this error is indicated by the cross-hatched bands on the c = 0, 
5, 10, 15, etc., curves. The lower boundaries of these bands were 
calculated to show the magnitude of the error introduced for the 
corresponding values of c when K = .1. The upper boundaries of
these areas correspond to values of K ≥ 1. For other values of c
only the upper boundaries of the corresponding bands are shown, 
the lower boundaries being easily deducible by visual interpolation 
to a sufficient degree of approximation for most practical purposes. 
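The near-invariance on which the factor K rests may be illustrated numerically. In the sketch below the per cent sample is held at 30 per cent and X and c are held fixed, while the universe is scaled from 1,500 to 30,000 items, so that the trouble limit X/N varies as .01/K. The Case I formula is assumed and the particular universes chosen are arbitrary.

```python
from math import comb

def weight(N, n, c, X):
    """Probability that the universe holds at most X defectives (Case I)."""
    tail = sum(comb(X + 1, t) * comb(N - X, n + 1 - t) for t in range(c + 1))
    return 1 - tail / comb(N + 1, n + 1)

c, X = 5, 30
results = {}
for N, n in [(1500, 450), (3000, 900), (6000, 1800), (30000, 9000)]:
    K = N / 3000  # scale factor relative to the universe of 3,000
    results[N] = weight(N, n, c, X)
    print(f"K = {K:4.1f}  N = {N:6d}  n = {n:5d}  W = {results[N]:.4f}")
```

The weights differ only slightly from one universe to the next, which is what allows a single family of c curves to serve for several trouble limits.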

As examples which may serve to illustrate this sort of application 
of Charts C consider the following: 

Example 4: A sample of n = 5,000 items has been drawn out of a 
universe of N = 20,000 items and c = 15 defectives were observed. 
May we assume with a weight W = .9 or more that the true proportion 
of defectives or trouble limit X/N is .005? 

Here .01/K is to equal .005 for our charts to apply. Therefore,
K = 2. Our sample n = 5,000 = 2,500K and our per cent sample is
100 n/N = 25 per cent. Corresponding then to an abscissa of 25
per cent and an ordinate of 2,500K on the W = .9 chart we locate a
point between the c = 19 and c = 20 curves. We could have allowed,
therefore, c = 19 defectives at the desired weight and trouble limit.
Since we observed a smaller number of defectives than was allowed, 
our weight W is therefore greater than .9. As a matter of fact it is 
practically only slightly less than .99 as appears from the W = .99 
chart when utilized in a corresponding manner. 

Example 5: As our next example we shall attempt to determine 
what is the trouble limit which corresponds with W = .9 to the 
results of the sample of Example 4. On the W = .9 chart corresponding
to an abscissa of 25 per cent we read from the c = 15 curve
an ordinate of 2,015K. But this must be our sample n = 5,000.
We, therefore, determine K from the equation 2,015K = 5,000 which
gives K = 2.48. Hence, our corresponding trouble limit is

$$\frac{X}{N} = \frac{.01}{K} = \frac{.01}{2.48} = .0040.$$
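The trouble limit of Example 5 may also be estimated without the chart by searching for the smallest X at which the Case I weight reaches .9. The sketch below evaluates the formula through logarithms of factorials (by way of `math.lgamma`), which is an implementation choice made here to avoid very large integers; the names `log_comb`, `weight` and `trouble_limit` are illustrative. A direct computation of this kind may differ slightly from a value read graphically.

```python
from math import exp, lgamma

def log_comb(a, b):
    """Natural log of the binomial coefficient C(a, b); -inf when it is zero."""
    if b < 0 or b > a:
        return float("-inf")
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)

def weight(N, n, c, X):
    """Case I weight W(c, X), evaluated in logarithms for large N."""
    log_den = log_comb(N + 1, n + 1)
    tail = sum(exp(log_comb(X + 1, t) + log_comb(N - X, n + 1 - t) - log_den)
               for t in range(c + 1))
    return 1 - tail

def trouble_limit(N, n, c, W_min):
    """Smallest proportion X / N at which W(c, X) first reaches W_min."""
    for X in range(c, N - n + c + 1):
        if weight(N, n, c, X) >= W_min:
            return X / N
    return None

limit = trouble_limit(N=20000, n=5000, c=15, W_min=0.9)
print(limit)
```

The result falls close to the figure of .0040 obtained from the chart by way of the factor K.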



So far our values of K have been greater than unity, so we have not 
had to consider our cross-hatched bands at all. In our next example 
we shall remedy this defect. 

Example 6: What number of defective items c may be allowed in a 
sample of 200 items out of a universe of 500 items so that W = .9
corresponding to a trouble limit X/N = .08? Here .01/K = .08;
therefore K = .125. Our ordinate, therefore, is 200 = 1,600K and our
abscissa is 200/500 × 100 = 40 per cent sample. The point corresponding
to this on the W = .9 chart lies just below the c = 12 curve
indicating at first glance that we could not accept 12 defectives in 
such a case. However, we note that K = .125 should be near the 
lower boundary of our cross-hatched band for c = 12 if such a band 
had been drawn in. From an inspection of the widths of the bands 
for c = 10 and c = 15 we correctly infer that our point determined 
by the 40 per cent sample and 1,600K would lie well within this
band, and that after all we could accept 12 defectives in the example 
in question. 

This example has been included merely to illustrate the interpre- 
tation of the bands shown on Charts C. It may be anticipated that 
in many if not most of the practical engineering problems only the 
upper boundaries of the bands need be used to obtain a degree of 
accuracy commensurate with the precision of the results desired and 
the applicability of the basic assumptions concerning randomness 
and the form of the a priori existence probability w(X).

If it should be desired to extend the range of these charts to cover 
values of W other than those shown, this may be done by means of 
the methods outlined in the mathematical analysis, the particular 
method to be used depending on the degree of precision required. 

The preceding pages have contained an outline of some of the 
theory and results based on the assumption that, within a range at 
least, all possible values of X, the unknown number of defectives, 
were a priori, that is, before the sample in question was made, equally 
likely. This assumption we mentioned as appropriate to consider in 
case we have no information to the contrary. The results may be 
also applicable to certain cases where we do have some information 
of a general sort, but which it is difficult to express analytically. 
However, it is by no means the only reasonable assumption to make 
concerning the form of w(X) as it enters into the basic formula (1). 

Case II

Another assumption is suggested by the following considerations 
which enter into many of the standard works on probability theory. 
Assume that the lot or universe in question was itself drawn at random 
from an extremely large stock or major universe in which the proportion
of defective items was p. Under these conditions the a
priori probability w(X) that our universe of N items would contain
exactly X defective items would be given by the expression

$$w(X) = \binom{N}{X} p^{X}(1-p)^{N-X}.$$



Using this expression for w(X) in the fundamental formula (1),
we obtain, by a process given in detail under the heading Appendix,
the formula

$$W(X_1, X_2) = \sum_{X=X_1}^{X_2}\binom{N-n}{X-c} p^{X-c}(1-p)^{N-n-X+c},$$

which for $X_1 = c$ and $X_2 = X$ reduces to

$$W(c, X) = \sum_{k=0}^{X-c}\binom{N-n}{k} p^{k}(1-p)^{N-n-k},$$

which is precisely the expression for the a priori probability that the 
remaining N — n items which we did not inspect contain not more 
than the X — c defectives which together with the c we have observed 
would assure us of a satisfactory universe. 

In this form W(c, X) turns out to be a simple binomial which, 
when N — n is large and p is small, may be reasonably approximated 
by the Poisson Exponential Binomial Limit for which extensive curves 
and tables already exist 5 and will, therefore, not be included in this 
article. 
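The binomial form of W(c, X) under Case II and its Poisson approximation may be compared in a short sketch. The numerical values of N, n, c, X and p below are hypothetical and chosen only for illustration, as are the function names.

```python
from math import comb, exp, factorial

def weight_case2(N, n, c, X, p):
    """Chance that the N - n uninspected items hold at most X - c defectives,
    each item being defective independently with probability p."""
    m = N - n
    return sum(comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(X - c + 1))

def weight_poisson(N, n, c, X, p):
    """Poisson approximation to the same sum, with mean (N - n) * p."""
    a = (N - n) * p
    return sum(exp(-a) * a**k / factorial(k) for k in range(X - c + 1))

# Hypothetical case: 200 of 1,000 items inspected, 2 found defective,
# at most 10 defectives tolerated in the lot, process average p = .005.
exact = weight_case2(N=1000, n=200, c=2, X=10, p=0.005)
approx = weight_poisson(N=1000, n=200, c=2, X=10, p=0.005)
print(round(exact, 4), round(approx, 4))
```

It will be noted that under this assumption the sample enters only through the number of uninspected items N - n and the margin X - c still allowed.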

In order to make practical use of the results of this assumption, 
we must have some knowledge of the appropriate value of the factor 
p in any given case. This factor should measure the probability 
that an item, selected at random, will, on inspection, prove to be 
defective. If a large number of tests have been made in the past 
on similar items, prepared by essentially the same process, the ratio 
of the total defectives observed to total items inspected in such tests 
may be a reasonable figure to use for p. In the case of many manu- 
factured articles such a ratio ought not to be very difficult to obtain. 
In certain cases it might be necessary to allow for such factors as
trend, improved process of manufacture, changes in personnel and so
on. In the final sampling of such complicated equipment as is
involved in a completely installed telephone central office it may be
necessary to take account also of breakage and such troubles due to
shipping and setting up of the equipment which may introduce
marked deviations from average conditions.

5 See article, "Probability Curves Showing Poisson's Exponential Summation,"
by George A. Campbell, Bell System Technical Journal, January, 1923.

Such general considerations as these should determine whether or 
not we can safely assume any given value for p, and if so, what value. 
It will be evident on a little consideration that the assumed value 
for p need not be extremely precise for many practical applications. 

Concerning the restrictions on the function giving the a priori
probabilities, w(X), it may be well to point out that this function is
only defined for the integral values of X such that $0 \le X \le N$.
Moreover, since probabilities are essentially positive, it cannot be
negative for any of these values of X. Also since the composition of
the lot is certainly comprised in this range of values of X, one has

$$\sum_{X=0}^{N} w(X) = 1.$$

The questions raised concerning the form of w(X) are of particular 
importance in connection with the economic phases of sampling, 
that is, the relative costs of having satisfactory lots rejected and 
unsatisfactory lots accepted by the sampling process. These costs 
are, of course, dependent on the frequency with which given propor- 
tions of defects occur in the lots in practice, and a detailed consideration 
of these would itself warrant a separate treatment. 

It is felt that the general methods outlined in this treatment, while 
not sufficiently detailed for immediate practical application to many 
of the problems in sampling of attributes, will nevertheless serve as a 
satisfactory basis for further work of a more specific nature. 

APPENDIX 

Case I: Assuming w(X) is a constant and noting that 6

$$\sum_{X=c}^{N-n+c} \binom{X}{c}\binom{N-X}{n-c} = \binom{N+1}{n+1},$$

the fundamental formula

$$W(X_1, X_2) = \frac{\sum_{X=X_1}^{X_2} w(X)\binom{X}{c}\binom{N-X}{n-c}}{\sum_{X=c}^{N-n+c} w(X)\binom{X}{c}\binom{N-X}{n-c}} \tag{1}$$

gives us, as one computing formula,

$$W(X_1, X_2) = \frac{\sum_{X=X_1}^{X_2} \binom{X}{c}\binom{N-X}{n-c}}{\binom{N+1}{n+1}}, \tag{2}$$

which is fairly manageable so long as the range $X_1$ to $X_2$ is not too
great and we have tables for the logarithms of the factorials involved.
When $X_2 - X_1$ is large compared with c, one may use the equivalent
formula

$$W(X_1, X_2) = \frac{\sum_{t=0}^{c} \binom{X_1}{t}\binom{N+1-X_1}{n+1-t} - \sum_{t=0}^{c} \binom{X_2+1}{t}\binom{N-X_2}{n+1-t}}{\binom{N+1}{n+1}}. \tag{2a}$$

6 Netto, "Lehrbuch der Kombinatorik," p. 15, Eq. 11.
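
The equivalence of the two computing formulas may be checked
numerically. The sketch below is offered only as an illustration, in
exact rational arithmetic; the function names `w_direct` and `w_short`
and the trial values are assumptions, not part of the article.

```python
from fractions import Fraction
from math import comb

def w_direct(X1, X2, c, n, N):
    """Formula (2): one term for every X in the range X1..X2."""
    num = sum(comb(X, c) * comb(N - X, n - c) for X in range(X1, X2 + 1))
    return Fraction(num, comb(N + 1, n + 1))

def w_short(X1, X2, c, n, N):
    """Formula (2a): two sums of c + 1 terms each, whatever the range."""
    def tail(x):
        return sum(comb(x, t) * comb(N + 1 - x, n + 1 - t) for t in range(c + 1))
    return Fraction(tail(X1) - tail(X2 + 1), comb(N + 1, n + 1))

# Illustrative values only (X2 must not exceed N).
print(w_direct(5, 30, 2, 10, 50) == w_short(5, 30, 2, 10, 50))
```

With c = 2 the second form needs six terms where the first needs
twenty-six, which is the saving the text refers to.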

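This transformation may, as Molina has shown, be effected as
follows: 7

$$\sum_{X=X_1}^{X_2} \binom{X}{c}\binom{N-X}{n-c} = \sum_{X=X_1}^{N-n+c} \binom{X}{c}\binom{N-X}{n-c} - \sum_{X=X_2+1}^{N-n+c} \binom{X}{c}\binom{N-X}{n-c}.$$

Now

$$\begin{aligned}
\sum_{X=X_2+1}^{N-n+c} \binom{X}{c}\binom{N-X}{n-c}
&= \sum_{X=X_2+1}^{N-n+c} \binom{N-X}{n-c} \sum_{t=0}^{c} \binom{X_2+1}{t}\binom{X-X_2-1}{c-t} \\
&= \sum_{t=0}^{c} \binom{X_2+1}{t} \sum_{X=X_2+1}^{N-n+c} \binom{X-X_2-1}{c-t}\binom{N-X}{n-c} \\
&= \sum_{t=0}^{c} \binom{X_2+1}{t}\binom{N-X_2}{n+1-t}.
\end{aligned}$$

Likewise

$$\sum_{X=X_1}^{N-n+c} \binom{X}{c}\binom{N-X}{n-c} = \sum_{t=0}^{c} \binom{X_1}{t}\binom{N+1-X_1}{n+1-t}.$$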

If in (2a) we let $X_1 = c$ and $X_2 = X$, we obtain

$$W(c, X) = 1 - \frac{\sum_{t=0}^{c} \binom{X+1}{t}\binom{N-X}{n+1-t}}{\binom{N+1}{n+1}}, \tag{3}$$

from which we may compute $W(X_1, X_2)$ from the equation

$$W(X_1, X_2) = W(c, X_2) - W(c, X_1 - 1).$$

7 See also Netto, "Lehrbuch der Kombinatorik," p. 12, Eq. 6; p. 15, Eq. 11.

A direct interpretation of the expression for W(c, X) gives us the
following interesting 

Theorem A: The a posteriori probability that a universe of N items 
contains not more than X defectives when c defectives have resulted 
from a random sample of n items is equal to the a priori probability 
of obtaining at least c + 1 defectives in a random sample of n + 1 
items from a universe of N + 1 items of which exactly X + 1 are 
defective. This theorem assumes that a priori all values of X are 
equally likely. 
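
Theorem A lends itself to a direct numerical check. In the sketch
below, which is illustrative only, the a posteriori probability is
computed from first principles with equal a priori weights and
compared with the a priori probability named in the theorem. The
function names and the trial values of N, n and c are assumptions.

```python
from fractions import Fraction
from math import comb

def posterior_at_most(X, c, n, N):
    """Bayes' rule with equal prior weight on D = 0..N defectives in the universe."""
    like = [comb(D, c) * comb(N - D, n - c) for D in range(N + 1)]
    return Fraction(sum(like[: X + 1]), sum(like))

def prior_at_least(X, c, n, N):
    """Chance of at least c + 1 defectives in n + 1 draws from N + 1 items,
    of which exactly X + 1 are defective."""
    at_most_c = sum(comb(X + 1, t) * comb(N - X, n + 1 - t) for t in range(c + 1))
    return 1 - Fraction(at_most_c, comb(N + 1, n + 1))

# Illustrative values only.
N, n, c = 30, 8, 1
print(all(posterior_at_most(X, c, n, N) == prior_at_least(X, c, n, N)
          for X in range(N + 1)))
```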

Writing $s = X + 1 - t$, we obtain from (3)

$$1 - W(c, X) = 1 - \frac{\sum_{s=0}^{X-c} \binom{X+1}{s}\binom{N-X}{N-n-s}}{\binom{N+1}{N-n}}. \tag{4}$$

The right-hand side of this equation is exactly what would have been 
obtained directly from (3) if we had been dealing with a sample of 
N — n — 1 instead of n and had observed X — c defectives instead 
of c, since the particular symbol chosen for the variable of summation 
is immaterial. 

This fact, which follows immediately from physical consideration of 
the equivalent a priori problem of Theorem A, may be stated as

Theorem B: If we calculate the probability W that a universe of N 
items contains not more than X defectives when a sample of n has 
shown exactly c defectives, then 1 — W is the probability that a 
universe of N items contains not more than X defectives when a 
sample of N — n — 1 has shown exactly X — c defectives. 

In making extensive calculations, this relation will serve to cut 
down the amount of computation considerably, as each calculated 
value of W may be made to do double duty. For a single calculation 
either (3) or (4) may be used depending on which involves the shorter 
summation. 

Another interesting relation also appears when we note that

$$\frac{\binom{X+1}{t}\binom{N-X}{n+1-t}}{\binom{N+1}{n+1}} = \frac{\binom{n+1}{t}\binom{N-n}{X+1-t}}{\binom{N+1}{X+1}},$$

which may be proved simply by cross multiplication of the combination
factors, writing them in terms of factorials. From this we see that

$$W(c, X) = 1 - \frac{\sum_{t=0}^{c} \binom{n+1}{t}\binom{N-n}{X+1-t}}{\binom{N+1}{X+1}}. \tag{5}$$

If we compare the right-hand sides of (3) and (5), we see that n
and X have simply been interchanged, which proves the rather 
interesting 

Theorem C: The probability W(c, X) that a universe of N items 
does not contain more than X defectives when a sample of n has 
shown exactly c defectives is equal to the probability W(c, n) that a 
universe of N items does not contain more than n defectives when a 
sample of X has shown exactly c defectives. 

Thus equations (3), (4) and (5) taken together show that we may 
make three different interpretations of a single calculation. 
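
The three interpretations may be illustrated with a short Python
sketch. A single function, here given the assumed name `w3`,
evaluates formula (3); Theorems B and C then follow by calling it with
the arguments rearranged. The trial values are assumptions chosen only
for illustration.

```python
from fractions import Fraction
from math import comb

def w3(c, X, n, N):
    """Formula (3): W(c, X) for a sample of n showing exactly c defectives."""
    s = sum(comb(X + 1, t) * comb(N - X, n + 1 - t) for t in range(c + 1))
    return 1 - Fraction(s, comb(N + 1, n + 1))

# Illustrative values only, with c <= X <= N - n + c.
N, n, c, X = 40, 12, 3, 9

w = w3(c, X, n, N)
theorem_b = w3(X - c, X, N - n - 1, N)  # sample of N - n - 1 showing X - c defectives
theorem_c = w3(c, n, X, N)              # n and X interchanged

print(w + theorem_b == 1, w == theorem_c)
```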

Up to this point, all of the analysis has been exact on the basis of 
the fundamental assumptions. We may now proceed with advantage 
to consider some approximate relationships which have been for 
some years of service in the calculation of practical curves and tables 
for cases where the values of N, n, or X were too great to be handled 
conveniently by means of the exact formulas. 

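Now consider in formula (3) a single term, $\pi_t$ say, where

$$\begin{aligned}
\pi_t &= \frac{\binom{X+1}{t}\binom{N-X}{n+1-t}}{\binom{N+1}{n+1}} \\
&= \binom{X+1}{t}\,\frac{(n+1)!}{(n+1-t)!}\cdot\frac{(N-n)!}{(N-n-X-1+t)!}\cdot\frac{(N-X)!}{(N+1)!} \\
&= \binom{X+1}{t}\left(\frac{n}{N}\right)^{t}\left(1-\frac{n}{N}\right)^{X+1-t} F(N, n, X, t),
\end{aligned} \tag{6}$$

where the form of the function F is to be determined.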

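To facilitate the consideration of this function we may split it up
into three similar parts as follows:

$$F(N, n, X, t) = \frac{\chi(n+1,\, t-1)\;\chi(N-n,\, X-t)}{\chi(N+1,\, X)},$$

where

$$\chi(n+1,\, t-1) = \left(1+\frac{1}{n}\right)(1)\left(1-\frac{1}{n}\right)\cdots\left(1-\frac{t-2}{n}\right).$$

Recalling that

$$\log(1+x) = x - \tfrac{1}{2}x^{2} + \tfrac{1}{3}x^{3} - \tfrac{1}{4}x^{4} + \cdots,$$

which converges for $x^{2} < 1$,
we have

$$\log \chi(n+1,\, t-1) = \log\left(1+\frac{1}{n}\right) - \sum_{r=1}^{\infty}\frac{1}{r}\sum_{j=1}^{t-2}\left(\frac{j}{n}\right)^{r},$$

$$\log \chi(N+1,\, X) = \log\left(1+\frac{1}{N}\right) - \sum_{r=1}^{\infty}\frac{1}{r}\sum_{j=1}^{X-1}\left(\frac{j}{N}\right)^{r},$$

$$\log \chi(N-n,\, X-t) = -\sum_{r=1}^{\infty}\frac{1}{r}\sum_{j=1}^{X-t}\left(\frac{j}{N-n}\right)^{r},$$

whence

$$\begin{aligned}
\log F(N, n, X, t) ={}& \log\left(1+\frac{1}{n}\right) - \log\left(1+\frac{1}{N}\right) \\
&+ \sum_{r=1}^{\infty}\frac{1}{r}\sum_{j=1}^{X-1}\left(\frac{j}{N}\right)^{r} - \sum_{r=1}^{\infty}\frac{1}{r}\sum_{j=1}^{t-2}\left(\frac{j}{n}\right)^{r} - \sum_{r=1}^{\infty}\frac{1}{r}\sum_{j=1}^{X-t}\left(\frac{j}{N-n}\right)^{r}.
\end{aligned}$$

Neglecting terms of the second order in 1/n, 1/N and 1/(N - n),
we have as an approximation

$$\log F(N, n, X, t) = \frac{1}{2}\left[\frac{(X+1)(X-2)}{N} - \frac{t(t-3)}{n} - \frac{(X-t)(X-t+1)}{N-n}\right].$$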

If now we select as our value of t, t = (n/N)X, we have log F(N,
n, X, t) = (X - 2)/2N, which is > 0 if X > 2, and

$$F(N, n, X, t) = e^{(X-2)/2N},$$

which gives us as an approximate value for the maximum term $\pi_t$,
where t = (n/N)X,

$$\pi_t = \binom{X+1}{t}\left(\frac{n}{N}\right)^{t}\left(1-\frac{n}{N}\right)^{X+1-t} e^{(X-2)/2N}. \tag{7}$$
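
The closeness of the exponential factor may be judged from a small
numerical trial. The following sketch is illustrative only. It forms
F as the ratio of the exact term of formula (3) to the corresponding
binomial term and compares it with exp((X - 2)/2N) at t = (n/N)X. The
function names and the values N = 2000, n = 200, X = 50 are
assumptions.

```python
from math import comb, exp

def exact_term(t, X, n, N):
    """Exact term of formula (3)."""
    return comb(X + 1, t) * comb(N - X, n + 1 - t) / comb(N + 1, n + 1)

def binomial_term(t, X, n, N):
    """The same term with F(N, n, X, t) taken as unity."""
    r = n / N
    return comb(X + 1, t) * r**t * (1 - r)**(X + 1 - t)

# Illustrative values only; t = (n/N)X is a whole number here.
N, n, X = 2000, 200, 50
t = n * X // N

F_exact = exact_term(t, X, n, N) / binomial_term(t, X, n, N)
F_approx = exp((X - 2) / (2 * N))
print(F_exact, F_approx)
```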

Having this term, it is a simple matter to calculate the other terms
necessary for evaluating W(c, X) by means of the exact equations

$$\pi_{t+1} = \pi_t \cdot \frac{n-t+1}{t+1} \cdot \frac{X-t+1}{N-n-X+t}, \tag{8}$$

$$\pi_{t-1} = \pi_t \cdot \frac{t}{n-t+2} \cdot \frac{N-n-X+t-1}{X-t+2}. \tag{9}$$

Due to the reciprocal relationship between n and X, we may obtain
in a similar manner

$$\pi_t = \binom{n+1}{t}\left(\frac{X}{N}\right)^{t}\left(1-\frac{X}{N}\right)^{n+1-t} e^{(n-2)/2N} \tag{10}$$

when t = (X/N)n.
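
The use of equations (8) and (9) may be sketched as follows. The
sketch starts from one term (computed exactly here, although in
practice the approximate maximum term would be used), reaches the
remaining terms by the two recurrences, and sums them to give W(c, X).
All function names and trial values are assumptions made for
illustration.

```python
from fractions import Fraction
from math import comb

def term(t, X, n, N):
    """Exact term of formula (3)."""
    return Fraction(comb(X + 1, t) * comb(N - X, n + 1 - t), comb(N + 1, n + 1))

def step_up(pi_t, t, X, n, N):
    """Equation (8): the term for t + 1 from the term for t."""
    return pi_t * Fraction(n - t + 1, t + 1) * Fraction(X - t + 1, N - n - X + t)

def step_down(pi_t, t, X, n, N):
    """Equation (9): the term for t - 1 from the term for t."""
    return pi_t * Fraction(t, n - t + 2) * Fraction(N - n - X + t - 1, X - t + 2)

def w_by_recurrence(c, X, n, N, t0):
    """W(c, X) = 1 - (sum of terms t = 0..c), every term reached from term t0."""
    terms = {t0: term(t0, X, n, N)}
    for t in range(t0, c):        # upward to t = c, if t0 lies below c
        terms[t + 1] = step_up(terms[t], t, X, n, N)
    for t in range(t0, 0, -1):    # downward to t = 0
        terms[t - 1] = step_down(terms[t], t, X, n, N)
    return 1 - sum(terms[t] for t in range(c + 1))

# Illustrative values only; t0 = (n/N)X = 4.
print(float(w_by_recurrence(c=3, X=40, n=50, N=500, t0=4)))
```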

It is by means of these relationships that we have calculated the
cases for N > 1,000 as shown on Charts C and feel that the precision
obtained is rather better than would have resulted from using formula
(6) for all values of t and assuming F(N, n, X, t) = 1. However,
for suitable ranges of the variables involved, the formula resulting
from this procedure

$$W(c, X) = 1 - \sum_{t=0}^{c} \binom{X+1}{t}\left(\frac{n}{N}\right)^{t}\left(1-\frac{n}{N}\right)^{X+1-t} \tag{11}$$

would be a fairly good approximation. This is simply part of the
well-known binomial expansion and is far simpler to compute than
the more precise formulae, although by no means easy at that.

We may draw several interesting practical conclusions, however,
from formula (11). For instance, we may note that as n/N approaches
0 and X becomes infinite in such a way that the product (X + 1)(n/N)
remains constant and equal to the average a, we have the familiar
Poisson Exponential Binomial Limit

$$W(c, X) = 1 - \sum_{t=0}^{c} \frac{a^{t} e^{-a}}{t!},$$

where a = (X + 1)(n/N).
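
The relative standing of the exact formula (3), the binomial
approximation (11) and the Poisson limit may be seen from a single
trial. The sketch below is illustrative only; the function names and
the values N = 5000, n = 100, X = 200, c = 1 are assumptions and do
not correspond to any point on the charts.

```python
from math import comb, exp, factorial

def w_exact(c, X, n, N):
    """Formula (3)."""
    s = sum(comb(X + 1, t) * comb(N - X, n + 1 - t) for t in range(c + 1))
    return 1 - s / comb(N + 1, n + 1)

def w_binomial(c, X, n, N):
    """Formula (11): F taken as unity in every term."""
    r = n / N
    return 1 - sum(comb(X + 1, t) * r**t * (1 - r)**(X + 1 - t)
                   for t in range(c + 1))

def w_poisson(c, X, n, N):
    """Poisson limit with a = (X + 1)(n/N)."""
    a = (X + 1) * n / N
    return 1 - sum(a**t * exp(-a) / factorial(t) for t in range(c + 1))

# Illustrative values only: small n/N, fairly large X.
args = dict(c=1, X=200, n=100, N=5000)
print(w_exact(**args), w_binomial(**args), w_poisson(**args))
```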

In addition we note from formula (11) that, for small values of
X/N, the variable N enters into the formula only in the ratio n/N.
From this we deduce the fact, borne out by independent calculations,
that by means of the proper use of a proportionality factor K applied
directly to n and N and inversely to X/N we may extend the Charts C
to care for values of $X/N \le .1$ to a very good degree of accuracy
and with considerable saving in space and computational labor.

By the reciprocal relationship between X and n as shown in exact
formulae (3) and (5), we obtain

$$W(c, X) = 1 - \sum_{t=0}^{c} \binom{n+1}{t}\left(\frac{X}{N}\right)^{t}\left(1-\frac{X}{N}\right)^{n+1-t} \tag{12}$$

which differs only in form from equation (3) of Molina's paper 8 on
the infinite universe case. Formula (12) does not give the same results
as (11) as it is most exact when n/N is small and becomes absolutely
exact in the limiting case of an infinite universe where n/N = 0.
This formula also approaches the Poisson Limit, in this case as X/N
approaches 0 and n + 1 becomes infinite in such a way that the
product (n + 1)(X/N) remains constant and equal to a, say.

8 Footnote 1.

The Poisson Limit, for the case of an infinite universe, was given 
by Molina in the Appendix to the article in the Bell System Technical 
Journal of January, 1924, already mentioned in this memorandum. 

Another point of interest is brought out when we note that in the
limiting form of (12) the Poisson gives us

$$W(c, X) = \sum_{t=c+1}^{\infty} \frac{a^{t} e^{-a}}{t!}, \qquad a = \frac{Xn}{N},$$

and for another pair of values of W and X

$$W_1(c, X_1) = \sum_{t=c+1}^{\infty} \frac{a_1^{t} e^{-a_1}}{t!}, \qquad a_1 = \frac{X_1 n}{N}.$$

Thus from properly chosen Poisson curves or tables we may obtain
the ratio $X_1/X = a_1/a$ which corresponds to the observed value of c
and the desired values of W and $W_1$. This ratio in exact formulae
is a function of N, n, and X also, but for many problems involving
small values of n/N and X/N the degree of approximation furnished
by this limiting form is fairly satisfactory and still further reduces
the amount of labor necessary in extending approximate results to
practice.
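
Where Poisson curves or tables are not at hand, the ratio may be found
by direct computation. The following sketch is one possible
procedure, not the method of the article. It solves the Poisson sum
for a by bisection and takes the quotient of two such solutions. The
function names, the bisection limits and the trial values are
assumptions.

```python
from math import exp, factorial

def poisson_tail(a, c):
    """Sum of a^t e^(-a) / t! for t = c + 1 to infinity."""
    return 1 - sum(a**t * exp(-a) / factorial(t) for t in range(c + 1))

def solve_a(W, c, lo=0.0, hi=200.0, iters=200):
    """Bisection for the average a at which the tail equals W.
    The tail increases steadily with a, so bisection applies."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if poisson_tail(mid, c) < W:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def ratio(W1, W, c):
    """Approximate X1/X = a1/a for the observed c and the two probabilities."""
    return solve_a(W1, c) / solve_a(W, c)

# Illustrative values only.
print(ratio(0.99, 0.9, c=0), ratio(0.99, 0.9, c=2))
```

For c = 0 the tail is simply 1 - e^(-a), so the ratio for W1 = .99 and
W = .9 is log 100 / log 10 = 2, which affords a check on the routine.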

The sort of procedure we have just been discussing may be facilitated
by means of a chart on which we show as abscissae values of c and
as ordinates values of the ratio $X_1/X$ which corresponds to various
values of W as shown by various curves and a specified value of $W_1$,
say $W_1 = .9$. Such a chart would enable us to interpret roughly a
given Chart C for W = .9 in terms of other values of W. For precise
work this procedure is not to be recommended, and, therefore, no
charts of the nature just described are included herein.

Approximations to the binomial other than the Poisson have been
discussed in many of the texts. In particular, for values of p in the
neighborhood of 1/2, the well-known Laplace-Bernoulli integral

$$\frac{1}{\sqrt{\pi}} \int_{a}^{b} e^{-t^{2}}\, dt$$

will serve as an approximate value for W, where the limits a and b
are functions of N, n, X, and c. This approximation is not so suitable,
however, for most telephone sampling problems in which the
proportion of defectives may be assumed in general to be far smaller
than 1/2.

We shall now proceed to discuss a few points concerning the analysis
of Case II in which instead of assuming w(X) constant we assumed
it to be of the form

$$w(X) = \binom{N}{X} p^{X} (1-p)^{N-X}.$$

Combining this expression for w(X) with the term $\binom{X}{c}\binom{N-X}{n-c}$
which appears in the basic formula (1), we have

$$\begin{aligned}
&\frac{X!}{c!\,(X-c)!}\cdot\frac{(N-X)!}{(n-c)!\,(N-X-n+c)!}\cdot\frac{N!}{X!\,(N-X)!}\; p^{X}(1-p)^{N-X} \\
&\qquad = \binom{N}{n}\binom{n}{c}\, p^{c}(1-p)^{n-c}\left[\binom{N-n}{X-c} p^{X-c}(1-p)^{N-n-X+c}\right].
\end{aligned}$$

Since only the factors in brackets involve the variable of summation
X, the remainder of this expression will cancel out in numerator and
denominator, leaving us with

$$W(X_1, X_2) = \frac{\sum_{X=X_1}^{X_2} \binom{N-n}{X-c} p^{X-c}(1-p)^{N-n-X+c}}{\sum_{X=c}^{N-n+c} \binom{N-n}{X-c} p^{X-c}(1-p)^{N-n-X+c}}$$

as the resulting form for (1) with this assumption for w(X).

It may be noted that the summation in the denominator above is
a complete binomial $(p + q)^{N-n}$ and as such equals unity, so

$$W(X_1, X_2) = \sum_{X=X_1}^{X_2} \binom{N-n}{X-c} p^{X-c}(1-p)^{N-n-X+c},$$

where p is assumed to be the a priori probability of a defective item
as determined from reliable information concerning conditions under
which the items are prepared.

As before when $X_1 = c$ and $X_2 = X$ we have

$$W(c, X) = \sum_{t=0}^{X-c} \binom{N-n}{t} p^{t}(1-p)^{N-n-t}.$$
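
The cancellation described above may be confirmed numerically. The
sketch below is illustrative only. It evaluates the basic formula (1)
term by term with the binomial form of w(X) and compares the result
with the reduced binomial sum. The function names and the trial
values are assumptions; the range is restricted to
c <= X1 <= X2 <= N - n + c.

```python
from math import comb

def posterior_direct(X1, X2, c, n, N, p):
    """Formula (1) with the binomial a priori weights w(D)."""
    def weight(D):
        prior = comb(N, D) * p**D * (1 - p)**(N - D)
        return prior * comb(D, c) * comb(N - D, n - c)
    total = sum(weight(D) for D in range(c, N - n + c + 1))
    return sum(weight(D) for D in range(X1, X2 + 1)) / total

def posterior_reduced(X1, X2, c, n, N, p):
    """Reduced form: a binomial on the N - n items not inspected."""
    m = N - n
    return sum(comb(m, D - c) * p**(D - c) * (1 - p)**(m - D + c)
               for D in range(X1, X2 + 1))

# Illustrative values only.
print(posterior_direct(1, 4, 1, 10, 60, 0.05),
      posterior_reduced(1, 4, 1, 10, 60, 0.05))
```

It will be seen that under this assumption the sample affects the
result only through the number c already found, the uninspected
remainder being judged by p alone.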

We may be willing in certain cases to admit the binomial form for
w(X) without being able or willing to assign any single value to p.
In such cases we may, however, proceed to make assumptions
concerning the probability that p has a given value. Let

$$g(X, p) = f(p)\binom{N}{X} p^{X}(1-p)^{N-X};$$

then

$$w(X) = \int_{0}^{1} g(X, p)\, dp = \binom{N}{X}\int_{0}^{1} f(p)\, p^{X}(1-p)^{N-X}\, dp,$$

where

$$\int_{0}^{1} f(p)\, dp = 1 \quad\text{and}\quad \sum_{X=0}^{N} w(X) = 1.$$

Suppose we assume f(p) constant for all values between 0 and 1;
we have

$$w(X) = \binom{N}{X}\int_{0}^{1} p^{X}(1-p)^{N-X}\, dp = \frac{1}{N+1},$$

which we note to be a constant which assigns to all of the N + 1 
possible a priori hypotheses concerning X an equal weight. This 
pair of assumptions in Case II amounts, therefore, to the same thing 
analytically as the assumption of Case I. 
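
This equivalence may be verified in exact arithmetic. In the sketch
below, which is illustrative only, the factor (1 - p)^(N - X) is
expanded by the binomial theorem and the integral taken term by term,
so that no numerical integration is needed. The function name and the
value N = 12 are assumptions.

```python
from fractions import Fraction
from math import comb

def prior_weight(X, N):
    """w(X) for f(p) = 1, integrating p^X (1 - p)^(N - X) term by term:
    the integral of p^(X + k) from 0 to 1 is 1 / (X + k + 1)."""
    integral = sum(Fraction((-1)**k * comb(N - X, k), X + k + 1)
                   for k in range(N - X + 1))
    return comb(N, X) * integral

# Illustrative value only.
N = 12
print([prior_weight(X, N) for X in range(N + 1)])
```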

Any number of possible hypotheses concerning f(p) might be 
made. Some of these would complicate the analysis considerably, 
others might be carried through fairly simply. One of these hypothe- 
ses might fit one class of physical problems, another some other class. 
To consider these all in detail in this paper would be outside of the 
scope of a general treatment. The methods outlined here would, 
however, hold for such extensions. Such difficulties as might be 
encountered would be of an analytical rather than a logical nature. 

In closing, the author wishes to express his appreciation to his 
numerous friends and associates in the Bell System, whose suggestions 
and cooperation have been of material assistance in the preparation of 
this work, and particularly the work of Miss Nelliemae Z. Pearson of 
the Department of Development and Research, under whose direction 
most of the computations were carried out and who has checked 
through the various proofs.

Key to the Charts

The charts present various graphical representations of the function
W(c, X, n, N), equation (3). This function gives the probability, W,
that the number of defectives in a lot of N is equal to or less than X,
after a sample of n units has shown c defectives, assuming that each
of the possible values of X between 0 and N was equally likely a priori.

Charts A: Separate pages refer to different values of n and N as 
labelled. 

Ordinates, W; abscissas, X. 

Solid curves, c; dotted curves (c/n — X/N) expressed as per cent. 

Charts B: Separate groups of curves refer to different values of W 
as labelled. 

Ordinates, X/N; abscissas, n. 

Separate sets of curves in each group refer to different values of 
c/n as labelled. 

Individual curves are for different values of N. 

Charts C: Separate pages refer to different values of W. 

Ordinates, n; abscissas, n/N. 

Separate curves for different values of c. 

Cross-hatching indicates amount of dependence on X/N. For
fuller explanation see pages 34-37 incl.

[Charts A: N = 300, n = 249; abscissas, X = Defectives in Universe.]